I.   NATURE  OF  THE  PROBLEM 

The  Bureau  of  Naval  Personnel  requires  many  mathematical 
models  for  accurately  predicting  the  structure  of  the  future 
force.   These  models  are  used  as  tools  to  aid  in  planning 
decisions.   Of  special  interest  is  the  problem  of  costing 
the  future  force  as  a  part  of  the  budget  submission 
procedure. 

The  Bureau  of  Naval  Operations  determines  the  personnel 
requirements  for  the  future  force  and  passes  these  to  the 
Bureau  of  Personnel  for  implementation.   These  requirements 
are  presented  in  the  form  of  quarterly  pay  grade  vectors, 
that is, the number of people required in pay grade E-1,
E-2, E-3, ..., E-9.  Since the amount of pay received is
dependent  on  the  member's  length  of  service,  the  problem  of 
predicting  the  total  cost  of  the  force  becomes  complex. 

The specific problem considered was: given the future
size of the force by pay grade and the past and present
inventories, predict the total annual base pay of the force
for future years.


II.   THE  NAPPE  MODEL 

One model currently used for this purpose is the Naval
Personnel Pay Predictor (Enlisted Model), referred to as
NAPPE.  The model makes use of a data base consisting of
three sets of quarterly inventories (pay grade by length of
service (LOS) force matrices) for all years since 1957.  The
inventories are for United States Navy (USN), United States
Naval Reserve (USNR), and Total Navy (TOTALNAV), the sum of
the two.

The procedure is first to predict the future quarterly
LOS vectors for the desired number of years into the future
(up to ten).  An LOS vector gives the total number of people
with length of service 1, 2, ..., 31 years.  The methodology
used for this prediction is discussed later in this section
and in Appendices A and B.  The LOS vector is then combined
with the pay grade requirements vector to get the predicted
force matrix.  A discussion of this procedure is also included
in this section.  The cost of the force is then simply the
product of the straight-line average (between successive
quarters) of the number of people in each cell of the matrix
and the pay scale for that cell, which is an input to the
model, summed over all cells.
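
The costing step can be illustrated with a short sketch.  It is
not the NAPPE code: the function name, the list-of-lists layout
(pay grade by LOS) and the reading of the pay scale as pay per
person per quarter are assumptions made only for illustration.

```python
def force_cost(matrices, pay_scale):
    """Cost of the force over consecutive quarters.

    matrices  : force matrices at successive quarter boundaries,
                each indexed [pay_grade][los_cell] (assumed layout)
    pay_scale : pay per person per quarter for each cell (assumed unit)
    """
    total = 0.0
    # Pair each quarter boundary with the next one.
    for start, end in zip(matrices, matrices[1:]):
        for g, row in enumerate(start):
            for s, n_start in enumerate(row):
                # Straight-line average of the cell population.
                average = (n_start + end[g][s]) / 2.0
                total += average * pay_scale[g][s]
    return total


# One pay grade, two LOS cells, one quarter.
example = force_cost([[[100, 50]], [[120, 50]]], [[10.0, 20.0]])
```

In the example the first cell averages 110 people and the second
50, so the cost is 110 x 10 + 50 x 20.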

A.   SMOOTHING  THE  LOS  VECTORS  OF  THE  INVENTORIES 

The first step in the NAPPE model's prediction of the LOS
vector is accomplished by a subroutine referred to as SMOOTH
(refer to Appendix A for the mathematical model).  Throughout
the discussion of SMOOTH it should be remembered that all
calculations applied to previous years' data are made
independently for the population (the total number of people)
in each element of the LOS vector (henceforth referred to as
an LOS cell) and for the transition rates from one cell to the
next, and that they are computed for all three data bases.
The transition rate is simply the proportion of the population
in cell i of year j that moves to cell i+1 in year j+1.  The
methodology is basic single exponential smoothing as discussed
by Brown [1] and others.
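
For reference, single exponential smoothing can be sketched as
follows.  This is a generic illustration, not the SMOOTH
subroutine; the function name and the choice of the first
observation as the starting level are assumptions.

```python
def exponential_smooth(series, alpha, start=None):
    """Return the smoothed level after the last observation.

    Each new level is alpha * observation + (1 - alpha) * old level.
    If no starting level is given, the first observation is used
    (an assumed initialisation).
    """
    values = list(series)
    if start is None:
        level, values = values[0], values[1:]
    else:
        level = start
    for x in values:
        level = alpha * x + (1.0 - alpha) * level
    return level


# With alpha = 0.5 the level lands midway between 0.10 and 0.20.
level = exponential_smooth([0.10, 0.20], 0.5)
```

The smoothed level serves as the prediction for the next period,
which is why the method cannot follow a sustained trend.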

The following procedure is done independently for each
LOS cell.  For each year of historical data, a prediction is
made based on exponentially smoothing the data up to that
year using values of the smoothing constant (henceforth
referred to as alpha) of .05, .10, ..., .95.  For each year,
the predicted value is then compared with the actual value to
determine which value of alpha would have given the best
prediction.  This results in the selection of an alpha for
each LOS cell for each year of data.  Consult Appendix A for
the exact procedure and forms of the resulting error that are
stored and used by the model.  "Best" predictions and
resulting errors are made for all years of historical data,
finally resulting in a decision for the "best" alpha for
predicting the future.
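
The search over alpha can be sketched as below.  The grid
.05, .10, ..., .95 is taken from the text; scoring each alpha by
its cumulative absolute one-step-ahead error is an assumption
made for illustration (Appendix A describes the error forms the
model actually stores), and the function names are hypothetical.

```python
# Grid of smoothing constants named in the text: .05, .10, ..., .95
ALPHAS = [round(0.05 * n, 2) for n in range(1, 20)]


def one_step_errors(series, alpha):
    """Errors of one-step-ahead predictions from single smoothing."""
    level = series[0]                # assumed starting level
    errors = []
    for actual in series[1:]:
        errors.append(actual - level)    # prediction is the prior level
        level = alpha * actual + (1.0 - alpha) * level
    return errors


def best_alpha(series):
    """Alpha with the smallest cumulative absolute error (assumed score)."""
    return min(ALPHAS,
               key=lambda a: sum(abs(e) for e in one_step_errors(series, a)))


# A steadily rising series favours the largest alpha on the grid.
chosen = best_alpha([1.0, 2.0, 3.0, 4.0, 5.0])
```

That a trending series pushes the choice to the top of the grid
is a symptom of the limitation discussed in Section III.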

The output of SMOOTH consists of four sets of LOS vectors.
For each year of historical data, there is a prediction based
on transition rates and a prediction based on the previous
years' cell populations, one pair based on the TOTALNAV data
and the other pair on the sum of the predictions based on
USN and USNR data.  Note that, due to the difference in the
structure of the USN and USNR, the sum of these predictions
may differ from the prediction based on their sum.

The final prediction is made by a subroutine referred
to as ADJSMO (refer to Appendix B for the model).  ADJSMO
considers five "methods" for prediction.  These include the
four outputs from SMOOTH plus a weighted average of these
predictions.  This weighted average is formed by multiplying
a weight (BWT) times the average of the two transition rate
based predictions plus the complementary weight (1.0 - BWT)
times the average of the two population based predictions.
This calculation is made for values of BWT of .45, .50, ...,
.95.  For each year of data a "best" method (of the five),
in the least square error sense, and a "best" weight, if a
weighted average method was chosen, are selected for predicting
that year.  The absolute sum of the errors of the "best"
prediction is also calculated for use in adjusting the final
prediction.  This adjustment is necessary because no transition
rate is available to predict LOS cell 1.
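
The weighted-average method can be sketched for a single cell
and a single period.  The BWT grid and the blending rule follow
the text; the function names, the argument order and the use of
one squared error (rather than the cumulative error of
Appendix B) are simplifying assumptions.

```python
# Weights named in the text: .45, .50, ..., .95
BWTS = [round(0.45 + 0.05 * n, 2) for n in range(11)]


def blended(bwt, rate_total, rate_sum, pop_total, pop_sum):
    """BWT times the mean of the two transition rate based predictions
    plus (1 - BWT) times the mean of the two population based ones."""
    rate_avg = (rate_total + rate_sum) / 2.0
    pop_avg = (pop_total + pop_sum) / 2.0
    return bwt * rate_avg + (1.0 - bwt) * pop_avg


def best_bwt(actual, rate_total, rate_sum, pop_total, pop_sum):
    """Weight giving the smallest squared error for this one value."""
    return min(BWTS,
               key=lambda w: (actual - blended(w, rate_total, rate_sum,
                                               pop_total, pop_sum)) ** 2)


value = blended(0.5, 100.0, 110.0, 90.0, 80.0)
weight = best_bwt(95.0, 100.0, 110.0, 90.0, 80.0)
```

Here the rate based predictions average 105 and the population
based ones 85, so a weight of .50 reproduces an actual value of
95 exactly.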

Having selected a "best" method and a "best" weighting
factor based on the last year of historical data, the model
predicts the first future year values for LOS cells 2-31.
At this point the model calculates the average (over all
years of data) proportion of the total population that was
in LOS cell 1.  This proportion is then applied to the total
force required for the quarter under consideration and
compared with the number which would be in cell 1 given the
predicted values for cells 2-31 and the required total.
Half of the difference in these two values is then allocated
among cells 2-31 according to the total absolute error
discussed above.  The prediction for cell 1 is then the
difference between the required total size of the force and
the predictions made for cells 2-31.

B.   GENERATING  THE  PAY  GRADE  BY  LOS  MATRIX 

The pay grade by LOS matrix is calculated using a method
for renormalizing contingency tables, as described by Mosteller
[Ref. 3].  This method is an iterative procedure which takes
the desired marginal totals of a matrix and a given, or base,
matrix of the desired form and constructs a matrix as similar
as possible to the base matrix, having the marginal totals
that were desired.

Since this method was used throughout the research, a
brief discussion of the procedure follows:

Let $A_{i,j}$ be the elements of the base matrix,
    i = 1, 2, ..., 9;  j = 1, 2, ..., 31

Let $R_j$ be the desired row totals

Let $C_i$ be the desired column totals

(1)  $R'_j = \sum_{i=1}^{9} A_{i,j}$        for all j

     $D_j = R_j / R'_j$                     for all j

     $A'_{i,j} = D_j A_{i,j}$               for all i,j

     $C'_i = \sum_{j=1}^{31} A'_{i,j}$      for all i

     $D'_i = C_i / C'_i$                    for all i

     $A''_{i,j} = D'_i A'_{i,j}$            for all i,j

     $A_{i,j} = A''_{i,j}$

     Return to step (1).

The procedure is continued until the row and column totals
converge to the desired totals.
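
The iteration can be sketched in a few lines.  This is an
illustration rather than the PNGPNG subroutine: the function
name, the tolerance and the iteration cap are assumptions, and
the base matrix is assumed to have no all-zero row or column
that carries a nonzero target.

```python
def renormalize(base, grade_totals, los_totals, tol=1e-9, max_iter=1000):
    """Rescale base[i][j] (pay grade i, LOS cell j) until its
    marginal totals match the desired ones."""
    a = [row[:] for row in base]
    n_grades, n_los = len(a), len(a[0])
    for _ in range(max_iter):
        # Scale every LOS slice (fixed j) to its desired total.
        for j in range(n_los):
            current = sum(a[i][j] for i in range(n_grades))
            if current > 0:
                factor = los_totals[j] / current
                for i in range(n_grades):
                    a[i][j] *= factor
        # Scale every pay grade slice (fixed i) to its desired total.
        for i in range(n_grades):
            current = sum(a[i])
            if current > 0:
                factor = grade_totals[i] / current
                a[i] = [factor * x for x in a[i]]
        # Grade totals are now exact; stop once LOS totals agree too.
        worst = max(abs(sum(a[i][j] for i in range(n_grades)) - los_totals[j])
                    for j in range(n_los))
        if worst < tol:
            break
    return a


fitted = renormalize([[1.0, 2.0], [3.0, 4.0]], [40.0, 60.0], [30.0, 70.0])
```

The two sets of desired totals must sum to the same grand total
(100 in the example) for the iteration to converge.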

In  the  NAPPE  model,  the  marginals  are  the  given  pay 
grade  vector  and  the  predicted  LOS  vector.   The  base  matrix 
is  calculated  as  the  simple  average  of  the  last  twelve 
quarterly  historic  inventories. 

III.   EXPERIMENTS  WITH  AND  CHANGES  TO  THE  NAPPE  MODEL 

Since the object of the research was to improve the
predictive accuracy of the NAPPE model, the sources of error
had to be determined.  It appeared that there were two
independent sources of error: the prediction of the LOS vector
and the instability of the Mosteller procedure for completing
the matrix.  Both of these possible problem areas were studied.
This section discusses the first area and the changes which
were made to the model to improve its predictive quality.
This is followed by a discussion of a study to discover the
factors which influence the Mosteller procedure.

A.   MAJOR  SOURCE  OF  ERROR 

The removal of either of the sources of error mentioned
above should improve the predictions of the model.  An ad
hoc test of this hypothesis was accomplished by using the
Mosteller subroutine (called PNGPNG) with the correct LOS
vector and comparing the results of the model with known
values, for years with historical data available.

The NAPPE model has a validation feature which facilitates
this and other kinds of comparisons.  As an input to the
model, the last date of historical data to be used is given.
The model then only looks at data up to that date and predicts
as if that were today's date.  Also included in the NAPPE
package (which consists of several minor models besides
NAPPE itself) is a model called NAPVAL.  This model compares
the NAPPE output with the actual inventories.  These
comparisons are discussed throughout this paper.  Specifically,
any number called "actual" will mean an output from NAPVAL.
Also, throughout the paper, the measure of effectiveness for
comparison will be the total annual cost of the force, which
is an output of both models.

In order to accomplish the objective mentioned above,
the SMOOTH and ADJSMO subroutines were removed from the model.
In their place, the actual LOS vector was read from the
inventories.  The following table compares the prediction
based on this procedure with the prediction of NAPPE.  The
elements of the table are the actual cost (NAPVAL), the NAPPE
prediction (with the model untouched), and the prediction
using only the Mosteller procedure (NAPPE with SMOOTH and
ADJSMO removed), labeled "Using Actual LOS".

TABLE I

COMPARISON OF NAPPE WITH PURE MOSTELLER
(costs are in millions of dollars)

        Actual    NAPPE         %        Using         %
Year    Cost      Prediction    Error    Actual LOS    Error

1968    1,832     1,839         .363     1,832         .014
1969    2,009     2,016         .372     2,010         .061
1970    2,280     2,286         .262     2,282         .081
1971    2,264     2,258         .290     2,266         .092

These results indicated that a major source of error in the
model, and hence a potential for improvement, results from
the prediction of the LOS vector, as was expected.  If one
looks at any feature of the enlisted force (such as size or
distribution), one finds that it is not stationary in time,
even considering statistical fluctuations.  There are obvious
trends.  During war years, the force becomes larger and, on
the average, younger, while during peace time, the force
becomes smaller and older.  Since single exponential smoothing
does not allow for trends, it could not be expected to handle
the problem being considered.

However, before attacking this problem, two other
questions that arose while documenting the model had to be
answered.  Is the sheer complexity of the model worth the
computer requirements?  (The following section indicates
not.)  Is the use of the entire data base justified?
Intuitively, the answer to the second question was no.
The size and structure of the force in the late 1950's are
not indicative of the force in 1975.  There are continual
policy changes which affect enlistment, promotion, and
retention.

B.   SIMPLE  EXPONENTIAL  SMOOTHING 

In order to answer these questions, the first change in
the model was made.  The SMOOTH and ADJSMO subroutines were
removed and a subroutine, SMOTHY, replaced them.  This
subroutine used simple exponential smoothing of the transition
rates and only the TOTALNAV data base.

1.   The SMOTHY Subroutine

This subroutine results in an extensive simplification
of the NAPPE model as it uses only four years of
historical data and a single alpha value of 0.4.  This alpha
value may seem very large, but it was desired to make the
prediction extremely dependent on the most recent data, which
is the most significant.  The actual subroutine is included
at the end of the paper but the simple mathematical model
follows:

Let $A_{i,j,k}$ be the number of people in LOS cell k
in quarter j of year i.

For the four years of historical data calculate the
loss rate for each quarter and each LOS independently:

    $TR_{i,j,k} = \frac{A_{i,j,k} - A_{i+1,j,k+1}}{A_{i,j,k}}$

The following series of data were smoothed for prediction:

    $TR_{1,1,k},\ TR_{1,2,k},\ \ldots,\ TR_{i,4,k},\ TR_{i+1,1,k},\ \ldots$

For each LOS calculate the annual loss rate for prediction
($P_{i,k}$) using single exponential smoothing.  The procedure
is to iterate through four quarters of data which results
in a single value for each year.  The superscript (j),
j = 1,2,3,4, is used to indicate the intermediate steps of
the procedure but need not be carried once the annual
predicted loss rate has been calculated.

    $P^{(1)}_{i,k} = \alpha\, TR_{i,1,k} + (1.0 - \alpha)\, P_{i-1,k}$

where $P_{i-1,k}$ is the final result from the previous year,

    $P^{(j)}_{i,k} = \alpha\, TR_{i,j,k} + (1.0 - \alpha)\, P^{(j-1)}_{i,k}$    for all j = 2,3,4

    $P_{i,k} = P^{(4)}_{i,k}$


Note that only one loss rate prediction is made for each
year.  This means that seasonal variations in the loss rate
are not taken into account.  Another approach would be to
predict a loss rate separately for each quarter, the tradeoff
being that this procedure would require more years of
data.  This raises the question of whether using older data
while taking seasonal variations into account would result
in a better prediction than dropping the older data and
ignoring the seasonal variation.  This is an area left for
further study.
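
The loss-rate calculation and the four-quarter smoothing pass
can be sketched as follows.  This is not the SMOTHY listing: the
function names and the 0-based list layout are assumptions, and
every starting cell is assumed to be non-empty so that the
division is defined.

```python
def loss_rates(this_year, next_year):
    """Quarterly loss rates between two consecutive years.

    this_year[j][k] is the population in quarter j, LOS cell k
    (0-based).  The rate for cell k compares it with cell k+1 of
    the same quarter one year later.
    """
    return [[(now[k] - later[k + 1]) / now[k] for k in range(len(now) - 1)]
            for now, later in zip(this_year, next_year)]


def smooth_year(previous_p, quarterly_rates, alpha=0.4):
    """Carry last year's smoothed rate through this year's four
    quarterly rates for one LOS cell and return the annual value."""
    p = previous_p
    for tr in quarterly_rates:
        p = alpha * tr + (1.0 - alpha) * p
    return p


rates = loss_rates([[100.0, 80.0]] * 4, [[90.0, 75.0]] * 4)
annual = smooth_year(0.2, [0.3, 0.3, 0.3, 0.3])
```

With alpha = 0.4 the annual value moves most of the way from the
previous 0.2 toward the observed 0.3 within a single year, which
is the heavy weighting of recent data the text describes.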

The prediction is now made for each quarter.

Let $T_{i,j,k}$ be the prediction for year i (first future
year), quarter j, and LOS k:

    $T_{i,j,k} = A_{i-1,j,k-1}\,(1.0 - P_{i,k})$    for all j = 1,2,3,4


As  with  the  NAPPE  model,  this  gives  predictions  for  LOS 
cells  2-31.   The  calculation  for  cell  1  was  done  in  a  manner 
similar  to  the  existing  NAPPE  model.   The  average  proportion 
of  the  total  force  in  cell  1  was  calculated  for  the  four 
years  of  data.   For  each  quarter  the  following  calculations 
were  made: 

Let C1AV be the number which would be required in
cell 1 calculated by taking the above proportion
of the total force requirement for the quarter
being predicted.

Let $C1P = Req - \sum_{k=2}^{31} T_{i,j,k}$, the total
required minus the sum of cells 2-31.

Half of the difference between these two values was
then allocated among cells 2-31 on the basis of the
number projected for that cell.
For each cell calculate

    $ADJ_{i,j,k} = \frac{(C1AV - C1P)}{2} \cdot \frac{T_{i,j,k}}{\sum_{k=2}^{31} T_{i,j,k}}$

    $T'_{i,j,k} = T_{i,j,k} + ADJ_{i,j,k}$

The value for cell 1 is then the difference between the
total required and the sum of the new predictions for cells
2-31.
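
The cell 1 adjustment can be sketched for one quarter.  The
function and argument names are hypothetical.  The direction of
the adjustment is also an assumption: the sign below is chosen
so that cell 1 ends midway between C1P and C1AV, which is the
form (C1P - C1ADJ) used in Appendix B.

```python
def adjust_for_cell1(predicted, required_total, cell1_share):
    """Spread half of the cell 1 discrepancy over cells 2-31.

    predicted      : projected populations of cells 2-31
    required_total : total force required for the quarter
    cell1_share    : average historical proportion in cell 1
    """
    c1av = cell1_share * required_total        # cell 1 from the proportion
    rest = sum(predicted)
    c1p = required_total - rest                # cell 1 as a remainder
    shift = (c1p - c1av) / 2.0                 # assumed sign, see lead-in
    # Allocate the shift in proportion to each cell's projection.
    adjusted = [t + shift * t / rest for t in predicted]
    cell1 = required_total - sum(adjusted)
    return cell1, adjusted


cell1, cells = adjust_for_cell1([300.0, 200.0], 600.0, 0.25)
```

In the example C1AV is 150 and C1P is 100, so cell 1 comes out
at 125 and the other cells are reduced to 285 and 190.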

2.   Results  of  SMOTHY 

The  model,  as  described  above,  was  then  run  to 
obtain  one  year  predictions  for  the  last  ten  years.   The 
following  table  is  a  comparison  of  these  results  with 
outputs  from  NAPPE  for  the  same  time  periods. 

TABLE II

COMPARISON OF NAPPE AND SMOTHY PREDICTIONS
(costs are in millions of dollars)

          Actual     NAPPE         %        SMOTHY        %
Year      Cost       Prediction    Error    Prediction    Error

1965      1,362.1    1,360.0       .151     1,358.1       .209
1966      1,582.4    1,591.6       .501     1,582.0       .028
1967      1,720.7    1,736.5       .918     1,737.9       .997
1968      1,832.6    1,838.8       .337     1,836.9       .237
1969      2,009.2    2,016.7       .372     2,015.7       .322
1970      2,280.8    2,286.8       .262     2,280.9       .001
1971      2,264.8    2,258.2       .290     2,256.2       .379
1972      2,496.7    2,480.7       .643     2,483.6       .524
1973      2,683.6    2,678.1       .205     2,678.3       .196
1974      2,777.4    2,773.6       .138     2,772.1       .190

Mean                               .3817                  .3083
Mean 2                             .3221                  .2318

The value given in the table as mean is the mean of the
absolute errors and the value called mean 2 is the same
except that the outlier (1967) is left out of the calculation.
Leaving this value out is not unreasonable when the events
of 1967 are taken into consideration.  It was during this
year that the structure of the force saw tremendous change
due to the Viet Nam buildup.  Any model based on past data
cannot predict the future when major policy decisions make
that data inappropriate.  This is a point where the analyst
using the model must use reason when looking at its output,
a point to be discussed later.

A close inspection of the preceding table yields
some surprising conclusions.  Although the SMOTHY model does
not predict uniformly better, it gives the smaller error in
six of the ten years and a lower mean absolute error (.3083
against .3817).  This suggests that the complexities of the
NAPPE model are not only unnecessary, but have a negative
effect.

C.   DOUBLE  EXPONENTIAL  SMOOTHING 

The most important hypothesis tested was that single
exponential smoothing is not the appropriate tool for modeling
a time series which appears to have trends.  As suggested by
Brown [1], Goodman [2], and others, higher-order exponential
smoothing is a valuable tool for modeling time series with
underlying trends.  Since the time series under consideration
does not show any properties which would indicate anything
beyond a linear trend, only double smoothing was considered
in SMOTH2.

1.   The SMOTH2 Subroutine

As with the SMOTHY subroutine, the SMOTH2 subroutine
makes use of only four years of historical data and only
the TOTALNAV inventories.  It also smooths only the loss
rates.  These loss rates ($TR_{i,j,k}$) are calculated exactly
the same as in SMOTHY, and the loss rate to be used for
prediction ($D_{i,k}$) is calculated as described by Brown [1].
The single smoothed portion ($P_{i,k}$) is calculated exactly as
before.  The double smoothing term is calculated by simply
smoothing the single smoothed value.  The superscript
notation is again used for the four iterations exactly as used
to calculate $P_{i,k}$.

    $S^{(1)}_{i,k} = \alpha\, P^{(1)}_{i,k} + (1.0 - \alpha)\, S_{i-1,k}$

    $S^{(j)}_{i,k} = \alpha\, P^{(j)}_{i,k} + (1.0 - \alpha)\, S^{(j-1)}_{i,k}$    for all j = 2,3,4

    $S_{i,k} = S^{(4)}_{i,k}$

These two values are then combined to pick up the
linear trend and result in:

    $D_{i,k} = 2 P_{i,k} - S_{i,k} + \frac{\alpha}{1.0 - \alpha}\,(P_{i,k} - S_{i,k})$

Note that again only one value of the smoothed loss
rate is calculated for each year and the same procedural
question was left unanswered.

The prediction is then made for each quarter into
the future exactly as in the SMOTHY subroutine:

    $T_{i,j,k} = A_{i-1,j,k-1}\,(1.0 - D_{i,k})$

As with the previously discussed models, this gives a
prediction for LOS cells 2-31.  The final adjustments used
in SMOTH2 are exactly the same as used in SMOTHY.
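
Double smoothing of a series of loss rates can be sketched as
below.  The function name and the explicit starting values are
assumptions; the recursion and the final combination follow the
form given above, with a lead time of one period.

```python
def double_smooth_forecast(series, alpha, p_start, s_start):
    """One-period-ahead forecast from double exponential smoothing.

    p is the single smoothed value, s is the smoothed value of p,
    and the forecast adds a linear trend term to 2p - s.
    """
    p, s = p_start, s_start
    for x in series:
        p = alpha * x + (1.0 - alpha) * p      # single smoothing
        s = alpha * p + (1.0 - alpha) * s      # smoothing of p
    return 2.0 * p - s + alpha / (1.0 - alpha) * (p - s)


# A flat series gives back its own level.
flat = double_smooth_forecast([0.2] * 4, 0.3, 0.2, 0.2)
# A rising series is projected beyond its single smoothed value.
rising = double_smooth_forecast([0.10, 0.12, 0.14, 0.16], 0.3, 0.10, 0.10)
```

The trend term is what lets the forecast run ahead of the
smoothed value on a rising series, and it is also what makes
the method slow to respond when the trend reverses.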

2.   Results of SMOTH2

As in the experiment with SMOTHY, it was obvious
that the most recent data should be most heavily weighted.
Therefore, an initial value of 0.4 was used for alpha.
Since the impact of trend was the most important
consideration of the research, other values of alpha were
also tried.  The following table is the result of these
tests.  In order to put the results in a form for analysis,
only the percentage errors were tabled and not the actual
dollar values.  The elements of the table are the percentage
error from the actual total cost of the force for NAPPE,
NAPPE with the SMOTHY subroutine (these values are the same
as Table II), and for NAPPE with the SMOTH2 subroutine using
values of alpha of 0.2, 0.3, and 0.4.

TABLE III

COMPARISON OF ABSOLUTE PERCENTAGE ERRORS
OF NAPPE, SMOTHY, AND SMOTH2

                                        SMOTH2
Year      NAPPE     SMOTHY    a=.2      a=.3      a=.4

1965      .151      .209      .242      .223      .215
1966      .501      .028      .015      .027      .018
1967      .918      .997      1.098     1.158     1.170
1968      .337      .237      .196      .098      .273
1969      .372      .322      .133      .033      .054
1970      .262      .001      .205      .076      .041
1971      .290      .379      .443      .423      .473
1972      .643      .524      .510      .527      .543
1973      .205      .196      .171      .164      .147
1974      .138      .190      .132      .068      .059

Mean      .3817     .3083     .3145     .2797     .2993
Mean 2    .3221     .2318     .2274     .1821     .2026
Mean 3    .2801     .1690     .1563     .0984     .1153


The values of mean and mean 2 have the same definition
as in the preceding table.  The value mean 3 was calculated
leaving out the values for 1967, 1971, and 1972.  The reason
for making this calculation will be discussed in detail later.
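
The three summary rows can be reproduced from the yearly
errors.  The sketch below uses the SMOTH2 column for an alpha
of 0.3 from Table III; the function name is hypothetical.

```python
# Absolute percentage errors for SMOTH2 with alpha = 0.3 (Table III).
errors = {1965: .223, 1966: .027, 1967: 1.158, 1968: .098, 1969: .033,
          1970: .076, 1971: .423, 1972: .527, 1973: .164, 1974: .068}


def mean_excluding(values_by_year, excluded=()):
    """Mean of the errors, leaving out the listed years."""
    kept = [v for year, v in values_by_year.items() if year not in excluded]
    return sum(kept) / len(kept)


mean_1 = round(mean_excluding(errors), 4)
mean_2 = round(mean_excluding(errors, {1967}), 4)
mean_3 = round(mean_excluding(errors, {1967, 1971, 1972}), 4)
```

The three results agree with the tabled values of .2797, .1821
and .0984 for that column.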

A first overview of the table would result in the
conclusion that SMOTH2, with an alpha value of 0.3, seems to
be a somewhat better model on the basis of the mean alone.
However, the mean is not the only significant feature.  The
most important observation is that SMOTH2 predicts extremely
well for all years except 1967, 1971, and 1972.  The reason
for the poor prediction in 1967 has already been discussed.
The poor predictions for 1971 and 1972 can be explained in
the same manner, except that the force structure was moving
in the opposite direction.  That is, these were the years of
major policy changes resulting from the end of the Viet Nam
War and the shrinking of the force.

Because of the linear trend, which is a part of
SMOTH2, it must be expected to do poorly when the direction
of the trend changes.  This means that SMOTH2 should have
more difficulty "turning the corner" when there are major
policy changes.  This does not mean that it is a poor model,
but rather that some decision rule is required of the user
when this occurs.

D.   THE  MATRIX  GENERATION  PROCEDURE 

Although incomplete and inconclusive, a study of the
Mosteller method, as used in this model, had some interesting
results.  It was found that, in general, the Mosteller
procedure is extremely sensitive to the base matrix.  Changing
the value in one cell of the base matrix resulted in changing
the values in virtually every cell of the output matrix.
There was no consistency found in these changes.  The
surprising result was that the changes in the output matrix
were sometimes greater than the initial changes in the base
matrix.  For example, changing one cell value in the base
matrix by less than 5% could result in changes in the output
matrix of greater than 5% in some cells.

This  is  extremely  significant  when  considering  its  use 
in  this  model.   The  base  matrix  is  calculated  as  a  simple 
average  of  the  last  twelve  quarterly  inventories.   This  means 
that  seasonal  variations  in  the  force  structure  are  not 
taken  into  account  and  implies  that  a  better  base  matrix 
may  be  possible. 

A hypothesis was made that a better base matrix could
be calculated by computing some very rough transition rates
from one cell to another and applying these rates to the
inventory of one year earlier.  Experiments with this
hypothesis showed some very promising results but were
inconclusive.  Continued study in this area may be of
considerable value.

IV.   CONCLUSIONS 

The NAPPE model, in its present form of single
exponential smoothing, does not appear to be the appropriate
model.  Single exponential smoothing assumes that the
time series is basically constant in time and that the
difference from the mean is caused by some random noise.
This does not appear to be the case with LOS populations or
with transition rates.

A recommended change to the model is to remove the
complex SMOOTH and ADJSMO subroutines and replace them with
the SMOTH2 subroutine, using an alpha of 0.3.  It should be
made clear to any intended user, however, that substantial
changes in the enlisted force management would not be
reflected in the prediction.  The modelling approach should,
in fact, be completely revised so that changes of this
magnitude can be accounted for.  Since pay grade totals are
used to drive the force structure, the model is aware of
impending changes in direction.  This information is not
currently being used in loss prediction by NAPPE.

In addition, the base matrix used in the Mosteller
procedure could be estimated more carefully.  Based on these
preliminary experiments, this could result in a much better
estimate of force structure, and hence a more accurate budget
prediction.  The determination of LOS cell 1 population
remains somewhat ad hoc, as does the choice of a smoothing
constant of 0.3.  While this study has demonstrated that
a simpler approach to the prediction can be successfully
taken, not all alternatives have been thoroughly
investigated.

APPENDIX A

This appendix is a rough documentation of the SMOOTH
subroutine in the current NAPPE model.

Let $A_{i,j,k}$ be the actual population for year i,
quarter j, and LOS cell k.

For each LOS k = 1, 2, ..., 30, calculate the transition rate
for each year and quarter:

    $TR_{i,j} = \frac{A_{i,j,k} - A_{i+1,j,k+1}}{A_{i,j,k}}$

For each year i = 1, 2, ..., NYR-1 (NYR = last year of
historical data) and each $\alpha$ = .05, .10, ..., .95, calculate

    $P1_{5} = TR_{1,1}$

Let n = 4i + j,   j = 1, 2, 3, 4

    $P1_{n+1} = \alpha\, TR_{i,j} + (1.0 - \alpha)\, P1_{n}$

For each year then the predicted transition rate is

    $PRED_{i+2,\alpha} = P1_{4i+1}$

The relative error in this prediction is then
the sum of the differences between the predicted
and actual transition rates for the year:

    $ER_{i+2,\alpha} = \sum_{j=1}^{4} \frac{TR_{i+1,j} - P1_{4i+1}}{1 - TR_{i+1,j}}$


Go to the INLINE subroutine to choose the best $\alpha$.

INLINE Subroutine

For each year i = 1, 2, ..., NYR find the best $\alpha$ for
predicting the following year.

For each $\alpha$ = .05, .10, ..., .95 calculate

    $EMIN_1 = ER_{i,\alpha}$

    $EMIN_2 = \sum_{l=1}^{i} ER_{l,\alpha}$

    $EMIN_3 = \sum_{l=1}^{i} (ER_{l,\alpha})$

Select the $\alpha$ which gives the minimum value of each of
$EMIN_1$, $EMIN_2$, $EMIN_3$ and call them $\alpha^*_1$,
$\alpha^*_2$, $\alpha^*_3$.

Calculate

    $SE_1 = \sum_{l=1}^{i} (ER_{l,\alpha^*_1})$

    $SE_2 = \sum_{l=1}^{i} (ER_{l,\alpha^*_2})$

    $SE_3 = \sum_{l=1}^{i} (ER_{l,\alpha^*_3})$

Within the summation here, each $\alpha^*$ is the one which
was selected for the given year.

Select the minimum of these three values; the most recent
$\alpha^*$ for that method is the $\alpha$ to be used to
predict the following year.  Call this value $\alpha_i$.

For each year i = 1, 2, ..., NYR and each quarter of the
year, make the prediction based on the transition rate:

    $T_{i,j,k} = A_{i-1,j,k}\,(1.0 - PRED_{i,\alpha_i})$



Now make a similar prediction based on population in each
cell.

For each year i = 2, 3, ..., NYR and each $\alpha$ = .05, .10,
..., .95

    $P1_{8} = A_{1,4,k+1}$

Let n = 4i + j,   j = 1, 2, 3, 4

    $P1_{n+1} = \alpha\, A_{i,j,k+1} + (1.0 - \alpha)\, P1_{n}$

    $PRED_{i+1,\alpha} = P1_{4i+1}$

The relative error in this prediction is then

    $ER_{i+1,\alpha} = \sum_{j=1}^{4} \frac{A_{i+1,j,k+1} - P1_{4i+1}}{A_{i+1,j,k+1}}$

Go to the INLINE subroutine to choose the best $\alpha$.  For
each year i = 2, 3, ..., NYR and each quarter (the same
value is predicted for all four quarters of a given year),
make the prediction based on cell populations:

    $P_{i,k} = PRED_{i,\alpha_i}$

APPENDIX B

This appendix is a rough documentation of the ADJSMO
subroutine in the current NAPPE model.

Define the five methods or techniques used in the
subroutine:

1.  $P(1)_{i,j} = T_{i,j,k}$
    This is the predicted value for year i, quarter j
    calculated as the transition rate based prediction from
    SMOOTH using the TOTALNAV data.

2.  $P(2)_{i,j} = T_{i,j,k}$
    This is the sum of the predictions from SMOOTH made as
    above using USN and USNR data.

3.  $P(3)_{i,j} = P_{i,k}$
    This is the predicted value for year i, quarter j
    calculated as the population based prediction from SMOOTH
    using the TOTALNAV data.

4.  $P(4)_{i,j} = P_{i,k}$
    This is the sum of the predictions from SMOOTH made as
    above using USN and USNR data.

5.  $P(5)_{i,j} = BWT\,\frac{P(1)_{i,j} + P(2)_{i,j}}{2} + (1.0 - BWT)\,\frac{P(3)_{i,j} + P(4)_{i,j}}{2}$
    This is a weighted average of 1-4 where the value of BWT
    is the value which would have predicted best for the
    previous year.

For each LOS k = 2, 3, ..., 31

For each year i = 1, 2, ..., NYR calculate the cumulative
square error for each method I = 1, 2, 3, 4, 5:

    $TEP(I)_i = \sum_{n=1}^{i} \sum_{j=1}^{4} \left( \frac{A_{n,j,k} - P(I)_{n,j}}{A_{n,j,k}} \right)^2$

For each year calculate the cumulative square error
for all values of BWT = .45, .50, ..., .95:

    $ET2(BWT)_i = \sum_{n=1}^{i} \sum_{j=1}^{4} \left( \frac{A_{n,j,k} - X_{n,j}(BWT)}{A_{n,j,k}} \right)^2$

where

    $X_{n,j}(BWT) = BWT\,\frac{P(1)_{n,j} + P(2)_{n,j}}{2} + (1.0 - BWT)\,\frac{P(3)_{n,j} + P(4)_{n,j}}{2}$

Based on TEP select the best method, I, and based on
ET2 select the best BWT which will then be used to calculate
P(5) for the following year.

Calculate the cumulative absolute error for the entire
period using the method which was selected as best for each
year:

    $TER_k = \sum_{n=1}^{NYR} \sum_{j=1}^{4} \frac{\left| A_{n,j,k} - P(I_{n-1})_{n,j} \right|}{A_{n,j,k}}$

where $I_{n-1}$ was the best method for that year.

Make the initial prediction for the first future year:

    $A_{NYR+1,j,k+1} = P(I_{NYR})_{NYR+1,j}$

where $I_{NYR}$ is the method selected best on the last
year of data.


Let $IPG_j$ be the total force required.

Calculate the average proportion of the total
force in cell 1 for all years of data, call it PAV.
For each quarter to be predicted, calculate

    $C1AV = (PAV)(IPG)$

    $C1P = IPG - \sum_{k=2}^{31} A_{NYR+1,j,k}$

These two values are the possible predictions for
LOS 1.  C1AV is based on the average proportion of the force
in cell 1, while C1P is simply the difference between the
total required force and the predictions for cells 2-31.
Let

    $C1ADJ = \frac{C1AV + C1P}{2}$

There is a test in the model to ensure that this average
is between the values which would have been calculated
using the largest and smallest proportions of the total
population in cell 1 over the entire data base.
Take the difference between C1ADJ and C1P and allocate it
among cells 2-31 according to the total error which was
calculated for predicting that cell using the best method:

    $A^*_{i,j,k} = A_{i,j,k} + \frac{(C1P - C1ADJ)\, TER_{k-1}\, A_{i,j,k}}{\sum_{k=2}^{31} TER_{k-1}\, A_{i,j,k}}$

Adjust cell 1 from these values:

    $A^*_{i,j,1} = IPG_{i,j} - \sum_{k=2}^{31} A^*_{i,j,k}$

The same basic procedure is used for predicting additional
future years (up to 10) and the value of continuing the
discussion is questionable.